Data Quality Assessment Report

massqc from tidymass by Xiaotao Shen

2022-03-06


INTRODUCTION

This report was generated by massqc (version 0.01), created in 2021 by Xiaotao Shen.


PARAMETERS

Table 1: Parameter settings

package_name function_name parameter time
massprocesser process_data path:E:/2019_CRC/20190109_CRC_RPLC50mm_neg_XCMS/tidymass 2022-03-01 09:03:10
massprocesser process_data polarity:negative 2022-03-01 09:03:10
massprocesser process_data ppm:20 2022-03-01 09:03:10
massprocesser process_data peakwidth:5,30 2022-03-01 09:03:10
massprocesser process_data snthresh:10 2022-03-01 09:03:10
massprocesser process_data prefilter:3,500 2022-03-01 09:03:10
massprocesser process_data fitgauss:FALSE 2022-03-01 09:03:10
massprocesser process_data integrate:2 2022-03-01 09:03:10
massprocesser process_data mzdiff:0.01 2022-03-01 09:03:10
massprocesser process_data noise:500 2022-03-01 09:03:10
massprocesser process_data threads:6 2022-03-01 09:03:10
massprocesser process_data binSize:0.025 2022-03-01 09:03:10
massprocesser process_data bw:5 2022-03-01 09:03:10
massprocesser process_data output_tic:FALSE 2022-03-01 09:03:10
massprocesser process_data output_bpc:FALSE 2022-03-01 09:03:10
massprocesser process_data output_rt_correction_plot:FALSE 2022-03-01 09:03:10
massprocesser process_data min_fraction:0.5 2022-03-01 09:03:10
massprocesser process_data fill_peaks:FALSE 2022-03-01 09:03:10
massdataset create_mass_dataset() no:no 2022-03-01 09:51:34
massdataset mutate() parameter_1:batch=as.character(batch) 2022-03-05 16:53:03
massdataset mutate_variable_na_freq() according_to_samples:QC01,QC02,QC03,QC04,QC05,… 2022-03-05 17:04:03
massdataset mutate_variable_na_freq() according_to_samples:men_normal_166,men_normal_2332,men_normal_2357,men_normal_2371,men_normal_2407,… 2022-03-05 17:04:03
massdataset mutate_variable_na_freq() according_to_samples:menLCCstage1_1122,menLCCstage1_1487,menLCCstage1_1532,menLCCstage1_1941,menLCCstage1_1969,… 2022-03-05 17:04:03
massdataset filter() parameter:~na_freq < 0.2 & (na_freq.1 < 0.5 | na_freq.2 < 0.5) 2022-03-05 17:04:15
massdataset mutate() parameter_1:class=case_when(class == "QC" ~ class, TRUE ~ "Subject") 2022-03-05 17:10:13
masscleaner impute_mv() method:knn 2022-03-05 17:46:18
masscleaner impute_mv() rowmax:0.5 2022-03-05 17:46:18
masscleaner impute_mv() colmax:0.8 2022-03-05 17:46:18
masscleaner impute_mv() maxp:1500 2022-03-05 17:46:18
masscleaner impute_mv() rng.seed:362436069 2022-03-05 17:46:18
masscleaner impute_mv() sample_id:men_normal_166,men_normal_2332,men_normal_2357,men_normal_2371,men_normal_2407,… 2022-03-05 17:46:18
masscleaner normalize_data() method:svr 2022-03-05 19:56:37
masscleaner normalize_data() keep_scale:TRUE 2022-03-05 19:56:37
masscleaner normalize_data() multiple:1 2022-03-05 19:56:37
masscleaner normalize_data() threads:4 2022-03-05 19:56:37
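The filter() step in Table 1 keeps a variable only if it is rarely missing in the QC samples and reasonably complete in at least one subject group. A minimal sketch of that rule in Python with NumPy follows; it is for illustration only and is not the massdataset API. It assumes that na_freq, na_freq.1 and na_freq.2 are the per-variable missing-value fractions for the QC, men_normal and menLCCstage1 sample groups, following the order of the three mutate_variable_na_freq() calls. The helper names and toy data are hypothetical.

```python
import numpy as np


def na_freq(x: np.ndarray) -> np.ndarray:
    """Fraction of missing (NaN) values per variable (row) of a variables x samples matrix."""
    return np.isnan(x).mean(axis=1)


def keep_variables(expr, qc_cols, normal_cols, cancer_cols):
    """Boolean mask mirroring: na_freq < 0.2 & (na_freq.1 < 0.5 | na_freq.2 < 0.5).

    Assumption: na_freq -> QC samples, na_freq.1 -> men_normal samples,
    na_freq.2 -> menLCCstage1 samples (hypothetical column index lists).
    """
    f_qc = na_freq(expr[:, qc_cols])
    f_normal = na_freq(expr[:, normal_cols])
    f_cancer = na_freq(expr[:, cancer_cols])
    return (f_qc < 0.2) & ((f_normal < 0.5) | (f_cancer < 0.5))


# Toy matrix: 3 variables x 6 samples (2 QC, 2 normal, 2 cancer); NaN = missing value.
expr = np.array([
    [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],        # QC complete, normal 50% MV, cancer complete -> kept
    [np.nan, 2.0, 3.0, 4.0, 5.0, 6.0],        # 50% MV in QC -> removed
    [1.0, 2.0, np.nan, np.nan, np.nan, 6.0],  # normal 100% MV, cancer 50% MV -> removed
])
mask = keep_variables(expr, [0, 1], [2, 3], [4, 5])
print(mask.tolist())  # [True, False, False]
```

Note that the subject-group thresholds use a strict "<", so a group with exactly 50% missing values does not rescue a variable.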

SAMPLE INFORMATION

#> -------------------- 
#> massdataset version: 0.99.2 
#> -------------------- 
#> 1.expression_data:[ 2719 x 299 data.frame]
#> 2.sample_info:[ 299 x 6 data.frame]
#> 3.variable_info:[ 2719 x 6 data.frame]
#> 4.sample_info_note:[ 6 x 2 data.frame]
#> 5.variable_info_note:[ 6 x 2 data.frame]
#> 6.ms2_data:[ 0 variables x 0 MS2 spectra]
#> -------------------- 
#> Processing information (extract_process_info())
#> create_mass_dataset ---------- 
#>       Package         Function.used                Time
#> 1 massdataset create_mass_dataset() 2022-03-01 09:51:34
#> process_data ---------- 
#>         Package Function.used                Time
#> 1 massprocesser  process_data 2022-03-01 09:03:10
#> mutate ---------- 
#>       Package Function.used                Time
#> 1 massdataset      mutate() 2022-03-05 16:53:03
#> 2 massdataset      mutate() 2022-03-05 17:10:13
#> mutate_variable_na_freq ---------- 
#>       Package             Function.used                Time
#> 1 massdataset mutate_variable_na_freq() 2022-03-05 17:04:03
#> 2 massdataset mutate_variable_na_freq() 2022-03-05 17:04:03
#> 3 massdataset mutate_variable_na_freq() 2022-03-05 17:04:03
#> filter ---------- 
#>       Package Function.used                Time
#> 1 massdataset      filter() 2022-03-05 17:04:15
#> impute_mv ---------- 
#>       Package Function.used                Time
#> 1 masscleaner   impute_mv() 2022-03-05 17:46:18
#> normalize_data ---------- 
#>       Package    Function.used                Time
#> 1 masscleaner normalize_data() 2022-03-05 19:56:37

Figure 1: Peak intensity profile.


MISSING VALUES


MISSING VALUES IN DATASET

Black indicates missing values (MVs).

Figure 2: Missing values in dataset


MISSING VALUES IN VARIABLES

Figure 3: Missing values in variables


MISSING VALUES IN SAMPLES

Figure 4: Missing values in samples
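Figures 2 to 4 all derive from the same missing-value mask, viewed as a whole, per variable, and per sample. As a rough sketch in Python with NumPy (hypothetical toy data, not the massqc implementation), the underlying fractions can be computed as below. Whether massqc plots counts or percentages is not stated in this report, so fractions are used here as an assumption.

```python
import numpy as np

# Toy expression matrix: rows = variables (features), columns = samples; NaN = missing value.
expr = np.array([
    [1.0, np.nan, 3.0, 4.0],
    [np.nan, np.nan, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0],
])

mv = np.isnan(expr)                # True where a value is missing (the black cells of an MV map)
mv_per_variable = mv.mean(axis=1)  # fraction of samples in which each variable is missing
mv_per_sample = mv.mean(axis=0)    # fraction of variables missing in each sample
overall = mv.mean()                # fraction of all values that are missing

print(mv_per_variable.tolist())  # [0.25, 0.5, 0.0]
print(overall)                   # 0.25
```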


RSD DISTRIBUTION

Figure 5: RSD (relative standard deviation) distribution
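RSD is the standard deviation of a variable divided by its mean, usually expressed as a percentage; in metabolomics QC it is typically computed across the QC samples. The sketch below (Python with NumPy, toy data) assumes that convention and the sample standard deviation (ddof=1). The 30% cutoff is a commonly used threshold chosen for illustration, not a value taken from this report.

```python
import numpy as np


def rsd_percent(expr: np.ndarray) -> np.ndarray:
    """Per-variable RSD (%) = 100 * SD / mean across columns.

    Rows are variables, columns are samples; uses the sample SD (ddof=1).
    """
    mean = expr.mean(axis=1)
    sd = expr.std(axis=1, ddof=1)
    return 100.0 * sd / mean


# Toy QC intensities: 2 variables x 3 QC injections.
qc = np.array([
    [100.0, 110.0, 90.0],   # stable feature: SD 10, mean 100
    [100.0, 200.0, 300.0],  # unstable feature: SD 100, mean 200
])
rsd = rsd_percent(qc)
print(np.round(rsd, 1).tolist())  # [10.0, 50.0]
print((rsd < 30).mean())          # 0.5, the fraction of variables below the illustrative 30% cutoff
```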


INTENSITY FOR ALL THE VARIABLES

Figure 6: Intensity for all the variables


SAMPLE CORRELATION

Figure 7: Sample correlation
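Sample correlation compares the intensity profiles of pairs of samples across all variables. The sketch below assumes Pearson correlation on log2-transformed intensities; massqc may use a different correlation method or transformation, and the toy values are made up. numpy.corrcoef with rowvar=False treats each column (sample) as one variable of the correlation.

```python
import numpy as np

# Toy matrix: rows = variables, columns = samples.
expr = np.array([
    [1000.0, 1100.0, 200.0],
    [5000.0, 5200.0, 4800.0],
    [300.0, 280.0, 900.0],
    [800.0, 850.0, 120.0],
])

log_expr = np.log2(expr)                     # damp the influence of very intense features
corr = np.corrcoef(log_expr, rowvar=False)   # 3 x 3 Pearson correlation between samples
print(np.round(corr, 2))
```

Here samples 0 and 1 share a similar profile and correlate much more strongly with each other than with sample 2.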


PCA SCORE PLOT

Figure 8: PCA score plot
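A PCA score plot places each sample by its projections onto the first principal components. This sketch assumes a log2 transformation followed by autoscaling (per-variable mean-centering and scaling to unit variance) and computes the scores with an SVD; the preprocessing massqc actually applies may differ, and the input here is simulated. Variables with zero variance must be removed first, because autoscaling divides by each variable's standard deviation.

```python
import numpy as np


def pca_scores(expr: np.ndarray, n_components: int = 2):
    """Return PCA scores (samples x components) and explained-variance ratios.

    expr: variables x samples matrix of positive intensities.
    Assumption: log2 transform, then autoscaling of each variable.
    """
    x = np.log2(expr).T                                 # samples x variables
    x = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)    # autoscale each variable
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]     # sample coordinates on the PCs
    explained = s**2 / np.sum(s**2)                     # variance ratio per component
    return scores, explained[:n_components]


rng = np.random.default_rng(0)
expr = rng.lognormal(mean=10, sigma=1, size=(50, 8))   # simulated: 50 variables x 8 samples
scores, explained = pca_scores(expr)
print(scores.shape)         # (8, 2)
print(explained.round(3))
```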